HITS' Graph-based System at the NTCIR-9 Cross-lingual Link Discovery Task
Abstract
This paper presents HITS’ system for the NTCIR-9 cross-lingual link discovery task. We solve the task in three stages: (1) anchor identification and ambiguity reduction, (2) graph-based disambiguation combining different relatedness measures as edge weights for a maximum edge weighted clique algorithm, and (3) supervised relevance ranking. In the file-to-file evaluation with Wikipedia ground truth, the HITS system is the top performer across all measures and subtasks (English-2-Chinese, English-2-Japanese and English-2-Korean). In the file-to-file and anchor-to-file evaluation with manual assessment, the system outperforms all other systems on the English-2-Japanese subtask and is one of the top three systems on the other two subtasks.
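The disambiguation stage can be pictured with a small sketch. The abstract does not spell out the algorithm, so the code below is only an assumption-laden illustration: it treats each anchor's candidate concepts as one partition of a graph, uses pairwise relatedness as edge weights, and exhaustively searches for the one-candidate-per-anchor combination with the largest total edge weight. The function name `best_clique`, the toy anchors, and all relatedness values are hypothetical and not taken from the paper.

```python
from itertools import combinations, product


def best_clique(candidates, relatedness):
    """Pick one candidate per anchor so the summed pairwise edge weight is maximal.

    candidates: dict mapping anchor text -> list of candidate concept names.
    relatedness: dict mapping frozenset({concept_a, concept_b}) -> weight;
                 pairs that are missing count as 0.0.
    """
    anchors = sorted(candidates)
    best_score, best_choice = float("-inf"), None
    # Exhaustive search over all combinations with one concept per anchor.
    # Each combination is a clique in the candidate graph (toy sizes only).
    for choice in product(*(candidates[a] for a in anchors)):
        score = sum(
            relatedness.get(frozenset(pair), 0.0)
            for pair in combinations(choice, 2)
        )
        if score > best_score:
            best_score, best_choice = score, choice
    return dict(zip(anchors, best_choice)), best_score


# Toy data, invented for illustration only (not from the paper).
candidates = {
    "jaguar": ["Jaguar (animal)", "Jaguar Cars"],
    "engine": ["Engine", "Game engine"],
    "sedan": ["Sedan (car)", "Sedan, France"],
}
relatedness = {
    frozenset({"Jaguar Cars", "Engine"}): 0.8,
    frozenset({"Jaguar Cars", "Sedan (car)"}): 0.9,
    frozenset({"Engine", "Sedan (car)"}): 0.7,
    frozenset({"Jaguar (animal)", "Engine"}): 0.1,
    frozenset({"Game engine", "Sedan, France"}): 0.05,
}

assignment, score = best_clique(candidates, relatedness)
print(assignment, score)
```

Exhaustive search is exponential in the number of anchors, so a real system would need pruning or an approximate solver; the sketch only shows what "maximum edge weighted clique" means for jointly disambiguating anchors.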
Similar resources
Multi-filtering Method Based Cross-lingual Link Discovery
This paper describes the cross-lingual link discovery method of ISTIC used in the system evaluation task at NTCIR-9. In this year's evaluation, we participated in the cross-lingual link discovery task from English to Chinese. In this paper, we mainly describe our understanding of CLLD, the key techniques of our system, and the evaluation results.
Overview of the NTCIR-10 Cross-Lingual Link Discovery Task
This paper presents an overview of the NTCIR-10 Cross-lingual Link Discovery (CrossLink-2) task. For the task, we continued using the evaluation framework developed for the NTCIR-9 CrossLink-1 task. Overall, recommended links were evaluated at two levels (file-to-file and anchor-to-file), and system performance was evaluated with the metrics LMAP, R-Prec and P@N.
Overview of the NTCIR-9 Crosslink Task: Cross-lingual Link Discovery
This paper presents an overview of the NTCIR-9 Cross-lingual Link Discovery (Crosslink) task. The overview includes: the motivation of cross-lingual link discovery; the Crosslink task definition; the run submission specification; the assessment and evaluation framework; the evaluation metrics; and the evaluation results of submitted runs. Cross-lingual link discovery (CLLD) is a way of automaticall...
Automated Cross-lingual Link Discovery in Wikipedia
At NTCIR-9, we participated in the cross-lingual link discovery (Crosslink) task. In this paper we describe our approaches to discovering Chinese, Japanese, and Korean (CJK) cross-lingual links for English documents in Wikipedia. Our experimental results show that a link mining approach that mines the existing link structure for anchor probabilities and relies on the “translation” using cross-l...
An evaluation framework for cross-lingual link discovery
Cross-Lingual Link Discovery (CLLD) is a new problem in Information Retrieval. The aim is to automatically identify meaningful and relevant hypertext links between documents in different languages. This is particularly helpful in knowledge discovery if a multi-lingual knowledge base is sparse in one language or another, or the topical coverage in each language is different; such is the case wit...